DOCUMENT RESUME

ED 103 469                                        TM 004 314

AUTHOR        Ambrosino, Robert J.; And Others
TITLE         Items and Instruction Evaluated Using Partitioning
              Procedures.
PUB DATE      [Aug 74]
NOTE          16p.; Paper presented at the Annual Meeting of the
              American Psychological Association (82nd, New
              Orleans, Louisiana, August 1974)

EDRS PRICE    MF-$0.76 HC-$1.58 PLUS POSTAGE
DESCRIPTORS   Achievement Tests; Cluster Analysis; *Cluster
              Grouping; Goodness of Fit; *Grouping Procedures;
              Student Evaluation
IDENTIFIERS   Hierarchical Clustering Analysis; Partitioning
              Procedures; Test Items



ABSTRACT 

Two studies were undertaken to demonstrate the
usefulness of partitioning procedures for studying test items.
Achievement test items in five content areas of educational
measurement were used as stimuli to be sorted by groups of students
with varying levels of sophistication with the content, with the
hypothesis that sorting by classes with greater sophistication would
agree more with simulated target sortings than sortings by classes
with less sophistication. These sortings were analyzed using
partitioning procedures. Results from both studies indicated that
degree of sophistication in measurement was overall a potent variable
in the sorting. In addition, several misconceptions among the
students concerning the content under study were revealed. It was
noted that a moderate number of students enrolled in upper-level
measurement courses demonstrated what amounted to errors in knowledge
in their sortings. It was concluded that the partitioning procedures
were useful for studying how items are perceived by students and for
determining how students organize content. (Author)






ITEMS AND INSTRUCTION EVALUATED
USING PARTITIONING PROCEDURES



Robert J. Ambrosino
Albany Regional Medical Program



Robert F. McMorris          Lorraine K. Noval
State University of New York
at Albany






Paper presented at the eighty-second
annual convention of the American
Psychological Association, New Orleans,
August, 1974.





Abstract 

Two studies were undertaken to demonstrate the usefulness of partitioning
procedures for studying test items. Achievement test items in five content
areas of educational measurement were used as stimuli to be sorted by groups
of students with varying levels of sophistication with the content, with the
hypothesis that sortings by classes with greater sophistication would agree
more with simulated target sortings than sortings by classes with less
sophistication. These sortings were analyzed using partitioning procedures.
Results from both studies indicated that degree of sophistication in
measurement was overall a potent variable in the sorting. In addition, several
misconceptions among the students concerning the content under study were
revealed. It was noted that a moderate number of students enrolled in
upper-level measurement courses demonstrated what amounted to errors in
knowledge in their sortings. It was concluded that the partitioning procedures
were useful for studying how items are perceived by students and for
determining how students organize content.




A variety of empirical approaches have been used for studying
test items, ranging from item discrimination indices to latent
trait methods. Typically, such approaches have relied on data
from testees' answers to the items. The empirical data for the
present study, however, are based on sortings of items, where each
respondent clustered the items according to his own perceptions.
These sortings were analyzed using partitioning procedures.

In this study achievement test items were used as stimuli to
be sorted by groups of students having differing levels of
sophistication with the content. It was hypothesized that the sortings
by members of those classes with greater sophistication would agree
more with simulated target sortings than would sortings by members
of classes with less sophistication. Other intents of the study
included evaluating the methodology as a procedure for studying
how items are perceived by students and for determining how students
organize content.

Method

Classes were used with varying levels of sophistication in
measurement: high school students, undergraduates enrolled in an
educational psychology course (EPSY 200) and in a pupil evaluation
course (EPSY 440), and graduate students enrolled in a pupil
evaluation course (EPSY 540), an educational and psychological
measurement course (EPSY 640), and in a more advanced measurement
seminar (EPSY 744). For the first of two applications of the
methodology, 131 students sorted the items.

Thirty multiple-choice achievement test items were used in
the content areas of correlation, validity, reliability, and standard
error of measurement. Item statistics available from previous
testings indicated a moderate range of item difficulty and
discrimination coefficients. Also, test items were initially selected
with reference to Bloom's (1956) Taxonomy of Educational Objectives;
four of the six major categories in the cognitive domain were
represented in this selection.

Each student was supplied with an envelope containing test 
items on individual slips of paper, several paper clips, and a 
piece of paper on which the student was requested to indicate his 
basis for sorting. The student was instructed to sort the items 
into between three and nine categories and to indicate the basis
for sorting that he used. 

The sortings were analyzed using the methods of latent partition
analysis (Wiley, 1967) and hierarchical clustering analysis
(Hartigan, 1972; Johnson, 1967) in a manner similar to that described
by Pruzek and Pfeiffer (1973) and Pruzek, Stegman and Pfeiffer
(1972). The reader is referred to the latter report for a discussion
of the algebra involved in the clustering procedures used in this
study. In essence, the goal was to measure the goodness of fit
of any single partition of the 30 items to a fixed target partition,
which corresponded to the investigators' hypothesis about
the cue system which the sorters should most likely use in
partitioning the items.
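
As a rough illustration of how sorting data of this kind can be pooled before clustering, the sketch below counts how often each pair of items lands in the same pile and then reads off single-linkage clusters at a chosen level. It is a minimal sketch under stated assumptions, not the latent partition analysis or the clustering programs used in this study: the function names, the 0.5 threshold, and the six-item data set are all hypothetical.

```python
from itertools import combinations

def cooccurrence(sortings, n_items):
    """Proportion of sorters who put each pair of items in the same pile.

    sortings: list of label lists; labels[i] is the pile chosen for item i.
    The diagonal is left at 0.0 because only distinct pairs are counted.
    """
    counts = [[0.0] * n_items for _ in range(n_items)]
    for labels in sortings:
        for i, j in combinations(range(n_items), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    k = len(sortings)
    return [[c / k for c in row] for row in counts]

def clusters_at(similarity, threshold):
    """Single-linkage clusters at one level: connected components of the
    graph joining every pair whose similarity is >= threshold."""
    n = len(similarity)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if similarity[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical data: three sorters, six items (not data from this study).
sortings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 2],
]
sim = cooccurrence(sortings, 6)
derived = clusters_at(sim, 0.5)  # threshold is an arbitrary illustrative choice
print(derived)
```

Lowering the threshold merges clusters and raising it splits them, which is the sense in which a hierarchy of partitions is obtained from a single similarity matrix.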

Manifest partitions for each class were analyzed with respect
to an a priori target partition based on the content area covered
by the item. The following item-content distribution was
hypothesized: correlation - 9 items, validity - 7 items, reliability
- 3 items, standard error of measurement - 4 items, and the
relationship between validity and reliability - 2 items.

In this study the q_st statistic was used as a measure of goodness
of fit for these data. A small value of this statistic, which
has a range from 0 to 1, is taken as evidence that "the target in
question can reasonably be regarded as having been the model in
some sense for an individual's manifest partition" (Pruzek, Stegman
and Pfeiffer, 1972, p. 7).
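
The algebra of q_st is given in the report cited above and is not reproduced here. To make concrete the idea of a partition-to-target index that runs from 0 to 1 with small values meaning good fit, the sketch below uses a stand-in: the proportion of item pairs that the two partitions treat differently (the complement of the Rand index). This is an assumption made for illustration only; it should not be read as the definition of q_st, and the function name and example partitions are hypothetical.

```python
from itertools import combinations

def pair_disagreement(manifest, target):
    """Share of item pairs grouped together in one partition but apart in
    the other. 0.0 means identical partitions; this is an illustrative
    stand-in, not the q_st statistic itself."""
    if len(manifest) != len(target):
        raise ValueError("both partitions must cover the same items")
    pairs = list(combinations(range(len(target)), 2))
    differing = sum(
        (manifest[i] == manifest[j]) != (target[i] == target[j])
        for i, j in pairs
    )
    return differing / len(pairs)

# Hypothetical six-item example.
target = [0, 0, 0, 1, 1, 1]   # target partition: two content areas
close  = [0, 0, 0, 1, 1, 2]   # one item split off from its area
far    = [0, 1, 2, 0, 1, 2]   # sorting unrelated to the target

for name, sorting in [("close", close), ("far", far)]:
    print(name, round(pair_disagreement(sorting, target), 3))
```

Under this stand-in, a sorter who reproduces the target scores 0, and the score rises as more pairs of items are grouped differently from the target.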

Results: Study A 

Table 1 contains mean q_st values as well as standard deviations
for each class, derived using the target partition based on item
content. Average q_st values are simply unweighted means computed
across all class members and are taken as a summary index of
goodness of fit for each class.



Insert Table 1 about here 



As can be seen, the average q_st values are highest for the
two groups with least sophistication (Grade 11 and EPSY 200),
and lowest for EPSY 744, the most sophisticated group. Results
were nearly identical for the three groups with some sophistication,
i.e., EPSY 440, 540, and 640.

Table 2 includes results obtained from a comparison of intergroup
q_st mean values using Duncan's New Multiple Range Test (Duncan,
1955; Cramer, 1956). The reader will note that significant
differences were observed for eleven of the fifteen comparisons. There
were no significant differences observed between q_st's based on the
Grade 11 and EPSY 200 data, and for comparisons made among q_st's
based on the EPSY 440, EPSY 540, and the EPSY 640 data.

Insert Table 2 about here 

Many students responded to labels such as the term "reliability"
in the item stems as an aid to sorting, as could be seen from the
summaries of the sortings, from self-reported replies to our request
for the basis for sorting, as well as from informal discussions
with students who had completed the task. For items where such
labels were not available, the sophistication of the group was a
more potent variable in the sorting. Cues within the alternatives
did not seem to have been important to the sorters.



Method: Study B

It was judged that results of the initial sortings were
confounded by the presence of labels in the item stems, and the
original set of items was revised to minimize such cueing by labels.
Specifically, nineteen of the thirty items were revised, with
intention to alter only the cues in the stems. Care was taken in this
revision not to alter significantly the original item difficulty
and discrimination levels and to maintain as closely as possible
the original distribution of items as they related to Bloom's (1956)
Taxonomy of Educational Objectives.

The process was replicated using a similar sample of students
with varying levels of sophistication in measurement. Included in
the 135 students were high school students and members of four
out of the five university courses represented in Study A.

Results: Study B

The data were first analyzed using the a priori target
specified for Study A. Table 3 contains means and standard deviations
of the q_st values for each class using this target. A detailed
examination of individual partitions revealed in some classes that
many sorters based their partitions on other than conventional
sorting strategies, such as length of item stem, key answer, and
the like. Partitions such as these were classified as outliers
and were excluded from further analysis. Specifically, thirty-eight
of the 135 sortings were classified as outliers.

The average q_st's contained in Table 3 are consistently higher
across the various classes than those contained in Table 1. Since
the composition of the classes and curricular content were
fundamentally the same for each experiment, it was concluded that these
differences were largely attributable to the cueing by labels
discovered in the initial experiment. With the exception of the Grade
11 data the manifest partitions more nearly approximated the target
partition as the sophistication of the class increased, i.e., for
these data sophistication of the group appeared overall to be a
potent variable in the sortings even within the relatively
homogeneous subset of classes.

Insert Table 3 about here 



A comparison of intergroup q_st mean values was also made for
these data. Table 4 includes results obtained from a comparison
of intergroup q_st's using Duncan's New Multiple Range Test. As can
be seen, significant differences were observed for seven of the
ten comparisons. No significant differences were observed between
q_st's based on Grade 11 and EPSY 200 data, Grade 11 and EPSY 440
data, and EPSY 540 and EPSY 640 data.
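
The report does not state how the tabled Duncan's values were computed. One plausible reading, offered here as an assumption, is the unequal-n extension of the multiple range test, in which the difference between two class means is scaled by sqrt(2 n_i n_j / (n_i + n_j)) before being compared with a critical range. The sketch below applies that scaling to the Study B means and class sizes from Table 3; the function name is hypothetical, and no critical ranges or significance decisions are computed.

```python
from itertools import combinations
from math import sqrt

def scaled_difference(mean_i, n_i, mean_j, n_j):
    """Absolute mean difference scaled for unequal group sizes.
    Assumed form: |m_i - m_j| * sqrt(2 * n_i * n_j / (n_i + n_j))."""
    return abs(mean_i - mean_j) * sqrt(2.0 * n_i * n_j / (n_i + n_j))

# Class means and sizes as reported in Table 3 (Study B).
study_b = {
    "Grade 11": (0.340, 19),
    "EPSY 200": (0.367, 22),
    "EPSY 440": (0.308, 16),
    "EPSY 540": (0.226, 15),
    "EPSY 640": (0.194, 25),
}

for a, b in combinations(study_b, 2):
    value = scaled_difference(*study_b[a], *study_b[b])
    print(f"{a} vs {b}: {value:.3f}")
```

For EPSY 440 versus EPSY 640 this scaling gives roughly .50, in line with the tabled .503, which is why the reading seems reasonable; it remains an inference from the reported numbers rather than a documented step of the analysis.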



Insert Table 4 about here



Data for each class were reanalyzed using derived targets
generated by the initial clustering procedures, with the intent of
further refining the results. The a priori target and the derived
hierarchical clustering target were practically identical for each
class, as were the mean q_st values derived using these targets.
Results of these comparisons, which failed to improve the accuracy
of initial results, are not included in this report.

A moderate number of sorters based their partitions on Bloom's 
(1956) Taxonomy of Educational Objectives: Cognitive Domain. A 
second target partition based on this classification scheme was
constructed for analyzing this subset of data. Analysis of these 
data using this subsequent target resulted in an extremely poor 
fit, however, and further presentation of the findings is not 
included in this report. 

Some misconceptions among the students concerning the content
were suspected. Comments made by Ss relative to their sorting
strategies were reviewed and two-way contingency tables comparing
the a priori target partition and the derived hierarchical clustering
analysis partitions were constructed for each class with the
purpose of detecting these errors.
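
A two-way contingency table of this kind can be sketched as a simple cross-tabulation of target categories against derived clusters. The example below is a minimal illustration with hypothetical items and cluster labels (A, B, C); it is not a reconstruction of any table examined in the study.

```python
from collections import Counter

def contingency(target, derived):
    """Count items in every (target category, derived cluster) cell."""
    return Counter(zip(target, derived))

# Hypothetical six-item example: one entry per item.
target  = ["validity", "validity", "validity",
           "reliability", "reliability", "correlation"]
derived = ["A", "A", "B", "C", "B", "B"]

table = contingency(target, derived)
rows = sorted(set(target))
cols = sorted(set(derived))

print(" " * 12, *cols)
for r in rows:
    # Counter returns 0 for cells with no items.
    print(r.ljust(12), *(table[(r, c)] for c in cols))
```

Cells off the dominant cluster for a target category point to items that a class grouped with a different content area, which is the kind of pattern examined for evidence of misconceptions.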

To illustrate, for several groups an item based on expectancy
tables was not associated with the validity items as expected.
One might question then whether the concept of expectancy tables
was adequately understood.

Two other misconceptions may be noted as illustrative. Several
persons sorted items based on reliability into a category which
they labeled correlation. It appears for these sorters that a
limited conceptualization of the notion of reliability had been
formed. In a similar fashion, others sorted items based on
criterion-related validity into the same correlation category.

Summary and Discussion

Two studies were undertaken to demonstrate the usefulness of
partitioning procedures for studying test items. Achievement test
items in the content areas of correlation, validity, reliability
and standard error of measurement were used as stimuli to be sorted
by groups of students with varying levels of sophistication with
the content, with the hypothesis that the sortings by members of
those classes with greater sophistication would agree more with
simulated target sortings than would sortings by members of classes
with less sophistication.

Findings from the first study indicated that sophistication
of the group in measurement was a reasonably potent variable in
the sorting. Subjects frequently responded to labels in the item
stems as an aid to sorting, however, and thus failed to systematically
apply their knowledge of the content to the sorting task.

The original set of items was revised to minimize such cueing
by labels and the experiment was replicated using a similar sample
of subjects. Results from the second study confirmed that degree
of sophistication in measurement was overall a potent variable
in the sorting.

Inspection of two-way contingency tables comparing the a priori
target partition and the derived hierarchical clustering analysis
partitions revealed several misconceptions among the students
concerning the content under study. In this context, it was noted
that a moderate number of students enrolled in upper-level
measurement courses demonstrated what amounted to errors in knowledge in
their sortings. Further, some content topics were apparently not
well understood.

Numerous misconceptions involving the use of Bloom's (1956)
Taxonomy of Educational Objectives: Cognitive Domain as a basis
for sorting the items were also noted. The majority of students
who used this paradigm as a sorting strategy appeared to have
mastered a knowledge of the category labels but failed to demonstrate
an in-depth understanding of the Taxonomy.

The procedures used in this study proved to be useful for
studying how items are perceived by students and for determining
how students organize content. Results such as those reported above
seem to have value as a means of feedback to an instructor regarding
the way in which his students perceive a given test and the
corresponding course content. Such information has the potential for
improving the teaching-learning process.

Further studies might include an investigation of the relationship
between the goodness of fit of sorting data and selected organismic
variables such as aptitude and achievement.



References 



Bloom, B.S. (Ed.) Taxonomy of educational objectives: Cognitive domain.
     New York: David McKay, 1956.

Cramer, C.Y. Extension of multiple range tests to group means with unequal
     numbers of replications. Biometrics, 1956, 12, 307-310.

Duncan, D.B. Multiple range and multiple F tests. Biometrics, 1955, 11,
     1-42.

Hartigan, J.A. Direct clustering of a data matrix. Journal of the American
     Statistical Association, 1972, 67, 123-129.

Johnson, S.C. Hierarchical clustering schemes. Psychometrika, 1967, 32,
     241-254.

Pruzek, R.M. and Pfeiffer, R.A. An illustration of an approach to analyzing
     partitioned data in the context of educational measurement. Paper
     presented at the annual convocation of the Northeastern Educational
     Research Association, Boston, November, 1972.

Pruzek, R.M., Stegman, C.A. and Pfeiffer, R.A. On the analysis of
     partitioned data. Paper presented at the annual meeting of the
     American Educational Research Association, Chicago, March, 1972.

Wiley, D.E. Latent partition analysis. Psychometrika, 1967, 32, 183-192.






TABLE 1

Means and Standard Deviations for
q_st's Derived Using Target Partition
Based on Item Content: Study A

Class         N     Mean q_st    SD (q_st)

Grade 11     39       .269         .068
EPSY 200     32       .280         .087
EPSY 440              .202         .072
EPSY 540              .200         .104
EPSY 640     26       .203         .068
EPSY 744     12       .096         .069



TABLE 3

Means and Standard Deviations for
q_st's Derived Using Target Partition
Based on Item Content: Study B

Class         N     Mean q_st    SD (q_st)

Grade 11     19       .340         .052
EPSY 200     22       .367         .087
EPSY 440     16       .308         .056
EPSY 540     15       .226         .075
EPSY 640     25       .194         .075






TABLE 2

Duncan's Values For Intergroup
q_st Comparisons: Study A

Class      Grade 11  EPSY 200  EPSY 440  EPSY 540  EPSY 640  EPSY 744

Grade 11
EPSY 200     .064
EPSY 440     .329*     .360*
EPSY 540     .359*     .402*     .008
EPSY 640     .399*     .410*     .004      .014
EPSY 744     .734*     .765*     .390*     .406*     .434*

* p < .05.



TABLE 4

Duncan's Values For Intergroup
q_st Comparisons: Study B

Class      Grade 11  EPSY 200  EPSY 440  EPSY 540  EPSY 640

Grade 11
EPSY 200     .012
EPSY 440     .133      .254*
EPSY 540     .452*     .592*     .322*
EPSY 640     .679*     .835*     .503*     .138

* p < .05.






